{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Evaluating Classifiers\n",
    "\n",
    "Goals:\n",
    "1. How to evaluate a machine learning model\n",
    "2. Understand overfitting\n",
    "3. Understand cross-validation\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "\n",
    "Machine learning is different than other types of engineering in that every improvement you make will typically make your model better in some cases and worse in other cases.  It's easy to make choices that seem like they should improve performance but are actually hurting things.  It's very important to not just look at individual examples, but to measure the overall performance of your algorithm at every step. \n",
    "\n",
    "Before we make any modifications to the classifier we built, we need to put in place a framework for measuring its accuracy."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## First Attempt\n",
    "\n",
    "We have 9092 labeled records.  We can try running our algorithm on that data and seeing how many it can correctly predict."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Percent correct:  79.5094588649\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "df = pd.read_csv('../scikit/tweets.csv')\n",
    "\n",
    "text = df['tweet_text']\n",
    "target = df['is_there_an_emotion_directed_at_a_brand_or_product']\n",
    "\n",
    "# Remove the blank rows:\n",
    "fixed_target = target[pd.notnull(text)]\n",
    "fixed_text = text[pd.notnull(text)]\n",
    "\n",
    "# Perform feature extraction:\n",
    "from sklearn.feature_extraction.text import CountVectorizer\n",
    "count_vect = CountVectorizer()\n",
    "count_vect.fit(fixed_text)\n",
    "counts = count_vect.transform(fixed_text)\n",
    "\n",
    "# Train with this data with a Naive Bayes classifier:\n",
    "from sklearn.naive_bayes import MultinomialNB\n",
    "nb = MultinomialNB()\n",
    "nb.fit(counts, fixed_target)\n",
    "\n",
    "predictions = nb.predict(counts)  # predictions is a list of predictions\n",
    "correct_predictions = sum(predictions == fixed_target) # correct predictions is a count \n",
    "print('Percent correct: ', 100.0 * correct_predictions / len(predictions))\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The naive bayes algorithm gets 79.5% accuracy.\n",
    "\n",
    "Does this seem like a good way to check the accuracy?  It shouldn't!  We tested our accuracy on the same data we used to fit our model.  This is what is known as testing on training data and it's a cardinal sin in machine learning.\n",
    "\n",
    "Lets try splitting our data.  We'll train a model on the first 6000 tweets and then test it on the remaining 3092 tweets.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3092\n",
      "Percent correct:  66.3971539457\n"
     ]
    }
   ],
   "source": [
    "# (Tweets 0 to 5999 are used for training data)\n",
    "nb.fit(counts[0:6000], fixed_target[0:6000])\n",
    "\n",
    "# See what the classifier predicts for some new tweets:\n",
    "# (Tweets 6000 to 9091 are used for testing)\n",
    "predictions = nb.predict(counts[6000:9092])\n",
    "print(len(predictions))\n",
    "correct_predictions = sum(predictions == fixed_target[6000:9092])\n",
    "print('Percent correct: ', 100.0 * correct_predictions / 3092)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overfitting\n",
    "\n",
    "Our accuracy measurement went down a lot - from 79% to 66%.  \n",
    "\n",
    "Two important questions to ask yourself:\n",
    "1. Why is testing on training data likely to overestimate accuracy?\n",
    "2. Why didn't our algorithm get 100% accuracy?\n",
    "\n",
    "We just checked the accuracy of our algorithm on the data it was trained on.  The algorithm could have memorized every tweet and just spit back what it memorized and gotten 100% accuracy.\n",
    "\n",
    "For example, here is a model that has high accuracy on the training data and doesn't generalize well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<matplotlib.lines.Line2D at 0x113504518>]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXYAAAD8CAYAAABjAo9vAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAIABJREFUeJztvXmUY1d17/85mks1Tz0P5aGxDcZuQ9tgphgDiTEBO0AIUCEkwCt4AR6s8MszUCshJPQLJC8hJGFBHAwxUAFeGAKPQIjBJnlmsN222267PfZQ5R5r6K5RpfGe3x/3nqsrlapKqpJKV1X7s1avlq6kqy3V1ffu+z377KO01giCIAjrh0C9AxAEQRCqiwi7IAjCOkOEXRAEYZ0hwi4IgrDOEGEXBEFYZ4iwC4IgrDPKFnalVFAp9aBS6vvO/QuUUvcopZ5WSn1DKRWpXZiCIAhCuVSSsX8AeMxz/1PAp7XWFwPngXdWMzBBEARhZZQl7EqpHcBrgC849xVwPfBN5ym3AzfXIkBBEAShMkJlPu9vgP8JtDr3u4FJrXXWuX8C2L7cTnp6enRfX1+lMQqCIGxo7r///nGtdW+5z19W2JVSvw6Maq3vV0pdV2lASqkBYABg165dHDhwoNJdCIIgbGiUUsOVPL8cK+bFwOuUUseBr2NbMJ8BOpRS5sSwAzhZ6sVa61u11vu01vt6e8s+4QiCIAgrZFlh11p/RGu9Q2vdB7wZuFNr3Q/cBbzRedrbge/WLEpBEAShbFZTx34L8AdKqaexPffbqhOSIAiCsBrKHTwFQGv9U+Cnzu2jwDXVD0kQBEFYDTLzVBAEYZ0hwi4IgrDOEGEXBEFYZ4iwCxVjWZpv3n+CqUSm3qEIglACEXahYn7wyGn+v395iC/cfbTeoQiCUAIRdqFijozOASDroAuCPxFhFyom6Bw1lii7IPgSEXahYtI5W9Czlgi7IPgREXahYlKZHABzqewyzxQEoR6IsAsVk3SEPZHO1TkSQRBKIcIuVEwyYwGSsQuCXxFhFyommbUz9UzOqnMkgiCUQoRdqJiUk7FncjJ4Kgh+RIRdqBiTsaclYxcEXyLCLlSMsWDEihEEfyLCLlSMsWCyYsUIgi8RYRcqJudMTJKMXRD8iQi7UDFZR9DFYxcEfyLCLlRMVjJ2QfA1IuxCxRhvPZMVj10Q/IgIu1AxGcvO1LOWZOyC4EdE2IWKMYOn6awIuyD4ERF2oWJcK0bKHYUNztDQEH19fQQCAfr6+hgaGqp3SACE6h2A0HgYC0YGT4WNzNDQEAMDAyQS8wTi7QwPDzMwMABAf39/XWOTjF2omKxnoQ0tqygJG5TBwUESiQRt1/4mO9//VYIt3SQSCQYHB+sdmgi7UDneTF3sGGGjMjIyAkDzpS8FILL1WQXb64kIu1AxOc+SeGLHCBuVXbt22Te0/RsItfUUbq8jIuxCxWQsTTRkHzoi7MJGZf/+/cTjcUABEIi1Eo/H2b9/f30DQwZPhRWQszStsRCprCVtBYQNixkgHfyFvZJYx6Zt7L/11roPnIJk7EKFaK3JWZqmcBCQDo/Cxqa/v5/23q0A3Pymt/pC1KEMYVdKxZRS9yqlHlJKPaqU+riz/Z+UUseUUgedf3trH65Qb0yfGCPsYsUIGxnL0swkMwBMJ/2zBnA5VkwKuF5rPauUCgN3K6V+6Dz2h1rrb9YuPMFvmIHTmAi7IJDKWphagmQmV99gPCwr7NouVJ517oadf3L9vUHJC7t9sZeWRmDCBiaVzYv5vI+EvSyPXSkVVEodBEaBO7TW9zgP7VdKPayU+rRSKlqzKAXfkHMmJEVDkrELgrdf0ny6UNjr2W6gLGHXWue01nuBHcA1SqnLgY8AlwJXA13ALaVeq5QaUEodUEodGBsbq1LYQr2wijJ26fAobGRSHmH3WjGm3cDw8DBaa7fdwFqJe0VVMVrrSeAu4Aat9WltkwK+BFyzyGtu1Vrv01rv6+3tXX3EQl0p9tjFihE2MsaKaYmGCqwY026gZe+raX7OywHWtN1AOVUxvUqpDud2E/Aq4HGl1FZnmwJuBh6pZaCCPzBWjAyeCgIkM/bx394ULrBiTFuBtmteT3zPtQu215pyMvatwF1KqYeB+7A99u8DQ0qpQ8AhoAf4RO3CFPyC6fklM08FIW/FdMTDrsiD3VYgEGsh3LmV1KknCravBeVUxTwMXFVi+/U1iUjwNcaKyQu7WDHCxsVYMR3xMOmcRTZnEQoG2L9/P//9lo8DkJ2xxxbXst2AtBQQKsIVdjPzVAZPhQ2Mm7E3RQC75LE1GKC/v5/jc0H+8Sjo+Rl2797N/v3712xmqgi7UBGWLszYpaWAsJFJGY89HgYcYY/Zt5//ouv4x6P3c+/dd3H59vY1jUt6xQgVsdCKkYxd2Li4VkyTLebJdP73MDWfth9zRH8tEWEXKsIqmqCUtSRjFzYuxoppb8pn7IaEUyXTHFl7Y0SEXagIk6BHXCtGMnbBf6zVrE9vVQwUCru53RQJ1uS9l0I8dqEipCpG8Dv5RaYTADVdZDrliHe7GTz11LKb8kfzW1lLJGMXKsK1YqSlgOBTzKxPAkFiu68Eajfrszhj97YVSGZyxMIB7Dmca4sIu1AR+YzdzDyVjF3wF2Z2Z8dLf5vNb95PZNulBduriRH21ljIue+xYtI5d92CtUaEXaiInJQ7Cj7HzO6Mbr8MgMjmiwq2V5NUNkckFHAF3NsUbD4jwi40CKa7YzgoVozgT9xFpp0kJNTaXbNZn6mMRTQUcCfspTKFwh6rw8ApyOCpUCHGigkGFOGgEitG8B3uItN3zwPQvml7zRaZTmUtoqGgewWb9FgxSbFihEbBWDEBpQgFAlLuKPiS/v5+urbtBuBVr319zabyp7I5O2N3hN2bsSezObcL6lojwi5UhHFeggFFKKhkgpLgW8wi05OJdM3eI5W1iIYDbjGBDJ4KDYnJ2IMB22eXlgKCH8nkLNcmnEvVbi1S22MPEg4qAqp48NSSjF1oDMzgqW3FKKmKEXxJsmBqf7Zm72OsGKUU0VBwwVJ59Zh1CiLsQoVY2jt4GhArRvAl3qn9c+naZezprOX669FwoOCEYlsx9ZFYEXahInLejD2opNxR8CWmy2JHPEwiVcuM3XJLHaOhwIJyR/HYhYbAm7GLFSP4FZOxdzdHSGRyroVYbVLejD0ULBw8rWMduwi7UBE5T1WMDJ4KfsUV9pYoWhfWl1cT47GDk7E7HrtladJZi1hIhF1oAArq2KXcUfApZsC0tyUK1K4yxlTFAMTC+cFTcyKRwVOhIbAsrxUjGbvgT5Juxm63061VZYypYweTsdvva9r3iscuNARuSwFltxQQj13wI/PO4Gl3c40zdq8VEw64PdjdRTZE2IVGwFgxSmG3FJCqGMGHzK9lxh4yVTH5wVNzxSCDp0JDUGDFSBMwwacYYe9xhL0Wtexa68I6dk+5o7likIxdaAhyCyYoScYu+I9kOl8VA9Sklt0MlBqPvdTgaUwmKAmNgLQUEBoBk7F3NdcuYzciHgnK4KnQ4BT2Y5eqGMGfzGdyhIPKXbLOO9W/WhgR9848LR48lSZgQkNgEvSg1LELPsa0zDUZc02E3RHxfFXMwsFTqWMXGgLXigk4VTFixQg+ZD5td1Y0GfN8Da2Y4pmnWmuxYoTGonDwVIkVI/gS04ArHAwQDioStbRiQnkrRmvI5LT/69iVUjGl1L1KqYeUUo8qpT7ubL9AKXWPUupppdQ3lFKR2ocr1JuF3R0lYxf8x3wmvyxdLBysbcbuqYqxt+dcr93PVkwKuF5rfSWwF7hBKfVC4FPAp7XWFwPngXfWLkzBL0hLAaER8C5y0RQOro3HbtY9zVpuxm62rTXLvqu2mXXuhp1/Grge+Kaz/Xbg5ppEKPgK14qRlgKCj/GuN9oUCRYsvFEtFlox+YHaZCZHLGyvrFQPyjqdKKWCSqmDwChwB3AEmNRam6r/E8D22oQo+AnjvAQCipBMUCrJvcfOcdvdx9BaTnr1wrvIRVOtrRhPrxizvZ4LWQOEynmS1joH7FVKdQDfAS4t9w2UUgPAAMCuXbtWEqPgIyxLEwzYWUg4YLcU0FrXLTPxIx/59sMcGZvjxRd3c+mWtnqHsyHxLnIRC9cqY7ec/RdZMRmrrqsnQYVVMVrrSeAu4FqgQyllTgw7gJOLvOZWrfU+rfW+3t7eVQUr1J+c1gQdEQ85M+5yMoBawJGxOQBOnp+vcyQbl2Q6RzxcW4897WbsjhXjGTyt5+pJUF5VTK+TqaOUagJeBTyGLfBvdJ72duC7tQpS8A+WpQk4R00oaAu8VMaU5tSkCHu9SHgHT2vusS8cPE3VOWMvx4rZCtyulApinwj+j9b6+0qpw8DXlVKfAB4EbqthnIJPyFn5jD3sKHwmZ9Vt6rTf8Prq5+YydYxkY1MweForjz1TlLF7Bk+95Zb1YFlh11o/DFxVYvtR4JpaBCX4l5zWBALGinEydqmMcUl4BGQ2JcJeDyxLk8paBXXspq68mhTXsReUO6ZzxCNlDWHWBJl5KlSEd/DUeOwZqYxxmfW0h51J1mZxB2FpitcbbYoEamrFmO6OMW9VTKa+V7Ei7EJFeAdPwwHJ2IvxirkIe30o7tNSy3LHSDDgXsEaKybl1LHXa9YpiLALFZKz8Fgx9uEjwm4zNDTEK294jXv/yWMjdYxm41Lcp6XJKXes9ryCVMYqmFlaXMceq9OsUxBhFyrE8g6eOh67WDG2qA8MDHB2YhIAnU1z+KljDA0N1TmyjUfxeqPmf+OJV4tUNueKOXgy9qzFXDpLc1Q8dqFByGmPxx6QjN0wODhIIpGAkN0LLzc3iQ7HGBwcrHNkGw+z3qipY4/XqHWvdyFryA+eJjM55lJZWkTYhUbBsjRmkqmpipFGYDAyYtsuKhgGIJeYJBBtdrcLa8d8pnjwNFiwvVqkskVWTCiAUjCZSGNpiEfFYxcaBG/GbqoB0iLsbrsMFTLCPkUg0iRtNOpAIm0PWnvLHaEGwp7JEfEIu1KKlkiIM9MpAMnYhcbBO0HJHNTpKnuXjcj+/fuJx+Nuxm4lJglE4/zZJ/bXObKNR7LE4CnUyooplNCWWIiz00kAmutYx16/dxYaEsszQUmEPU9/fz8Ag1/8IQAtIXvc4XWvf1PdYtqoLGbFVLtfTCqbK/DYwc7SXWGXjF1oFLwZe1SEvYD+/n4+9qd/BsBH/uB/ADCXllr2tcYMni7I2GvhsYcXZuxnpmxhFytGaBi8dexuxi4eu4spqetstqtjEiLsa05xHXutFrQurmMHW8zNMSCDp0LDoLXGGTPND55Kxu5ivovOuC3ss6nqz3gUliZfx24fn7WqikmWsGJaY/ksXTJ2oWHwthQQj30h6axFMKBoc37giZRk7GvNfDpHQOUTD5O5V91jL9EPxivm4rELDUPOWjh4mhIrxsUeUAu4P+pZEfY1Zz5jd1Y0q3rVqirGrGvqpSUazt+W7o5Co2B5MvZo0P7B1CJjHxoaoq+vj0AgQF9fX8NMzU9nLSIeYU/UoPmUsDSJdGEv9LwVU93jtNTydy0eK6aeHruUOwoVUSpjr7awm74riUQCgOHhYQYGBoB8WaFfSefsjn/NjphIxr722J0VC2eEQnU9dq21k7EXeezOCT0YUISD9cubJWMXKsKyqLnH7vZdAULtmwFIJBIN0XcllbFL4PIZuwj7WuNdPQnsGaHVXvc0nbOwNAusmI64bcWY2dn1QoRdqAhvS4FgQBEMKNK56toNpr9Ky95Xs/09txG/7GUF2/1MysnYjbBIVczaU8oiaYpUtye7WZGpOGO/oKcZqH9BgQi7UBFeKwbsyoNqH8Smv0p8zwvs/5/1ooLtfsb22IMEAormSFCqYupAqfVGTU/2apEyJZVF73PJlla6miN88vXPrdp7rQQRdqEi7MHT/P1IqPrCbvquhHv77Pfo3U08Hmf/fv/3XfH2D4lHQzLztA4k0lniRasXxcLVXR6veBKUoTUW5oE/ehVvvqa+SYgMngoVkfOseQqOsFe53LG/v5+0BR9/tAOAcMdWPv8Pt/p+4BQgnc13/GuJhpgTK2bNSaRzxItqyJsiQZJrYMX4BcnYhYrIWZqAKrRiqr0yDcDLbrgJgOft6oBgiBtuemPV36MWpL0ZeyTInFgxa04ilXOrkgzVtmLmXSvGnxLqz6gE32Lpwow9WgMrBmDU6Wl9xQ47azeNlfyO14ppFiumLthWTGHGHquysBe3BvYbIuxCRSwYPK2VsM/YQn7lznYATk/NV/09aoGZoATQHAmKFbPGaK1tK6ZUxl5VK8beV1SEXVgPWDpfxw618dgBxmbsjP25252MfboxMnYzQQkkY68H6ZxF1tIL+rQ0Rapbxy4Zu7CuWDB4WoNyR4DRmRSRUIALe5oJBxWnG8WKyeQXOG6OhMRjX2NMVl4yY6+qsJvBU39KqD+jEnzLgsHTmnnsSTa1RgkEFD0tUTeD9zvpnMeKiYZIiBWzpswtIuyxKlsx84vUsfsFEXahIixPP3awB09rURUzNptiU2sUgN7WBhJ2r8ceDTKXzqK1rnNUGwczIax48DQeCbpZdjVIirAL6wlLF2bs8Uio6gsYgF0Vs6k1BkBvA2Xspm0v2Bm7pamqoAhLY7ppNkcXWjHpnEW2SuNBxmIrfh+/IMIuVIR3aTywM6FaTJsfnUmxqc2Tsc/6X9gtS5PJ6YKqGJAOj2uJGawuztjdBa2rdHU5k8oSCQUWrKDkF5YVdqXUTqXUXUqpw0qpR5VSH3C2/4lS6qRS6qDz78bahyvUG28/djCVH9Vf/X1qPkNvS17YJ2ZT5Cx/WxqmOsjrsYN0eFxLzJhGKY8dqve3mE1m3Ra9fqScyLLAh7TWDyilWoH7lVJ3OI99Wmv9v2sXnuA3iqtizOxKrbW7Ys1qGZ9NA7agA2xqjWJpmJjL2zN+xIw1mCzOZI2Ssa8dicziVTEAyXR1MvbZVLZgUQ2/sWzGrrU+rbV+wLk9AzwGbK91YII/KRb25miIrKWrWstu/PRez+Cpd7tfMdVB3l4xgExSWkOm5zOA3YzLixH6as0rmE1m67pY9XJU5LErpfqAq4B7nE3vU0o9rJT6olKqs8qxCT4kZ2lCRRk7UNWyvoYVdufkFnXKhsyK9TPJTN1i2mhMOcLe3lQo7G3OfSP8q2UmtU6EXSnVAnwL+KDWehr4HHARsBc4DfzVIq8bUEodUEodGBsbq0LIQj0pbinQ7NgN1ZxhuUDYW2IF2/1Kyp1mbv+szGo65xMi7GvF9HyGaCiwoAzRCP1klYR9Npl1T9x+pCxhV0qFsUV9SGv9bQCt9VmtdU5rbQH/CFxT6rVa61u11vu01vt6e3urFbdQJ3K6KGOPmkGp6mXs404FTHezLew9rREA31fGuIOnQSPsdtyTiXTdYtpoTM1nFmTrkD/JTlVL2Bs9Y1f2iNhtwGNa67/2bN/qedpvAI9UPzzBT2itF8w8dTP2Kg4Qjs2k6IiHXa86HgnREg35PmMv9tjbYiGCAcWkZOxrxmSitLCbbVNV+lvM+XzwtJzIXgy8DTiklDrobPso8Bal1F5AA8eBd9ckQsE3mHLDkh57FTP2sZmUW+poaITZp8VVMUop2pvCnJeMfc2Yms+42bmXlqhzkp2vzt/C9tgXvo9fWFbYtdZ3A6Xq2H5Q/XAEP5NzpsYHiqpioMoZ+2zK9dcNjTD7tDhjB9sCqJavKyzP1HyGbR0LS2LNSbYaVkwqmyOdtWjx6axTkJmnQgWUytjzk3Cqm7H3lMrY/e6xlxD2znik6h770NAQfX19BAIB+vr6GBoaqur+G5mp+YxbAVNMR1O4KrbYhDPPorvoGPUTIuxC2RhhL6xjt7OWapX0WZbmzFSSrUVZV29rlLFpfwt7KutUxXgz9iqJiWFoaIiBgQFGTp1Fa83w8DADAwMi7g7TiwyeArTHq5Oxu1VbIuzCeqCUsHfFIyhVvVLEszNJ0jmLnZ3xgu29rVFmUllf9zdPlcjYu5ojbpVPNRgcHIRdz2fnB79B728MApBIJOztG5xEOstMKrvAxjN0xSNVOU7N33Ox9/EDIuxC2ZQS9lAwQHdz9WySZ87ZS+Dt7CoU9t3d9v3hiURV3qcWuFaMp6/xto4mRmdSVetZP/LMCTpf/g5UIEj8WdcS69trbx8Zqcr+G5lTk/ZiLNs7mko+vq2jqSoLthTPs/AjIuxC2ZQSdrB7uYxWySZ55pwt3Ds7C3+cfd3NAByfmKvK+9QCtyrGs6rO9s4mtK7eYty7rvk1Qm29jH3vL8jNnqf1qtfY23ftqsr+G5lTk3ZSsLW9tLBv72xiaj6z6t49Rti7WyKr2k8tEWHfYGRyFgNfPsA7/uk+MhX2dzFVMcGiZl+b2qKMVsmKGT6XQCn7R+ilr8cW9mPj/hV2k5VHg/lqiR1O9nhisjpXGlfe9C6s+RkST/ycucM/pemifTR3bWb//v1V2X8jY4S9VFWMvb2p4HkrZWzWnmfh15a9IMK+4fj2Ayf4j8NnufPxUe56fLSi12ZzpTP23pYoozPVyUgPn5ri4t6WBT+almiITa1RjozNVuV9akFx217In6BOnl+dmIC9as/TiSau2RZh987tzD16JyoY5vc+9nf09/evev+NzqmpJErB5rbSwr7dEfzV/i1OnJ9nyyLv4RdE2DcQWmtu//kwvdEcZNO85QMfq6hcztKlhX1zW4zx2XTFVwClePjEFM/d3l7ysSt2dPDgyOSq36NWpDILhX1rexORYICnRld/QvrpE2PMpXN88PUv4/jx46TOHuWyrW2MBLetet/rgSNjs2zvaCIcLC1ru7rsq76nRmdW/T4XbWpZ1T5qjQj7BuKeY+c4fHqao//+RZJnjxDedEFF5XLf+e73APidt/12wQnh4k0t5CzN0bHV2STPnEswOpPiuTtKC/vVfZ0cG5+r2tVBtUnncoQCquDEFwkFuGJHO/cdP7fq/X/rgRN0N0d44YVd7rY3PG87Dz0zydNVOHE0OodOTHH5ttLHDtiDnbu74xw4fn7F7zGZSDM8keCyLa0r3sdaIMK+gfjbnzwF89Ocf+DfyZ4/TahjC1BeudzQ0BB//McfA0BbuYITwhWOEN97bGJV8X334EkAXnnZ5pKPv2RPDwD/96HTq3qfWuFdyNrL1Rd0cejE1KomKj16aoofP3aW/hfsIuTJSF+3dxvBgOKrvxxe8b7XA0fGZhk5lyg46ZXi6r4u7j1+zl2MulLufnocgGsu6F7R69cK/3axKROtNeOzac7NpbG0xtIarUEpUCj7f+9t7PsU3FcEnOdsaouWvfK41pqz06mC/hOxUJDtnYtfDhaTzOQ4M5V0Z256441HgmzraFpgfayE+46f4+dHJjj/i/+DzqbITp4m+JzrIBiGXGbZcrnBwUGSaTsObdmxmhPCsbe+lT2bWrjt7mPs2dzK5dvby+58l7M0h05O8bOnx/n8fx7lpXt6FpQ6Gp6zrZ2r+zr5+zufoqclws6uuDsLVnm6XnjHdpWCgFIFx0Cg6O9f/LhSsKUtViCg5ZDKWgWTkww3793O5//zCB/+1iF+98V9xMJBAk5cOUs7x61tdVmWfVtrTc7ZPj6T4tM/fpKelii/9+ILCva9qTXGb129k9t/cZy2WIgXXNhNczTkHufmM3nRnhUGNXqJx4pfp5d4rOBeBfssvf/i5xY9VBB3ztJ87qdHiAQD3PjcrSzF65+3nW/ef4KPffdR3vD8HYSD9hVWQKmCv4M2tz1/j2Q2x2d+/BTbO5p4/m5/Lz/R0MJ+cnKe3//q/Tx0Yqpq+4xHgnzstc/mt65eunxsdCbJu24/wMMl3rsjHuZTb7iCX3vOliX38bV7R/jE9w8vuWbo5rYof/2mvbz44p5FnzM2k+Jr945w1a4OXrqndGvkz/zYFkMmHmEayJ4/jVIBQh2byU6cWLZcbmRkhFCvIyqWVbBdKcWf3Xw577r9AG++9ZcEA4rffVEfgzdeVtBXpphHTk7x34fud2vXr9zZwSffcMWScfzFG6/kt79wDx/4+sEln7daupoj/M1v7eVlzyq/1fRiGfslW1r5g1c+i7+640n+/dEzK4pnR2cTX/idfXQ2Lyyx+6PXPJupRIa/vfNpuPPpFe2/0QkHFZ+4+XI2LTOo+aKLenj7tbu5/RfDfOPAMxW/T3MkyOff9vyqJFu1pGGFXWvN73/1fo6Oz/HRGy9le0ecYAB33U37DG9n79q5rwvua/d5Go1l2eV833ngJB/59iH27uzkkiV8tI986xBPnZ1139v8nWdTWb78i2He/7UHufNDv8KOztLZ5+FT03z0O4e49sJuXv+8HW5DIRMf2H0vvnj3MQa+fID/d8v1dJX4UVuW5l233+ee3D771ufxmisKs5YHR85z99PjfPTGS2m+4E8YGBggM2kLTLh9C5H5c0uWyz18YpJd1/46p48+7rxp/kRkTggvvLCbn91yPfePnOOHh85w293HuGRzK2+6emfJfWZzFu/+yv3kLM2nf+tKXrqnd0F/mFJc0NPMT//wOh47Pc3EbNq9QjOYm1pr9+8OJuvKHwOW5+9v9mGOi6yl+dLPjvHef36Au2+5ftEp6sUsJuwA73/FHn5z306eGp0hm9Nuph4MKAJOxmiyePd2wP4/HglxyebWRU+STZEgn+1/HoOT8xyfmCOZydmfx/OZitej9d4rzugLrniK+/+pkjed16klHlt8n2qJfVLm6y7e1FLW8QPw8Zsu510vvZBj43PkrKK/hVLuVZ75O5gr+mBAccmW1gXL7vmRhhX2nx+Z4KETU3zqDc9dNruuhFddtpkX/PlP+Pp9I3zstc8p+ZwzU0nufGKU9738YgZedtGCx6+9qJuXfOouvnvwFO99+cUl9/HlXxwnFgryuf7n016izajh6r5OXvnX/8VXfjHMB165Z8HjP3l8lIdOTPG/fuO5fOO+ET7+fx/lVc/eXCAw/3L/CZrCQd76gt20OPEOfuIvAdjUdwn7//SDS5bLve7vfwYvfTex07YPb6+tAvF4vOCE0B4Pc/2lm3n5JZs4fHqar/xyeFFhv/PzGf4xAAAbcElEQVTxUU5OzvMPb3v+slc2xYSDAa7Y0VHRayrlOdvaeN3f/4wfHDrNW64p7/iyrZjFbbwt7TG2tNeuTG5bR5Nbqy0szc6u+KKW33qgYQdPf/ToGeKRIDdfVd11tTubI7zwwm5+cWTxgcC7nhhFa7hpb+kysx2dcfbu7OBHi1x2a62564lRXnHZpiVFHeDiTa1c3dfJjx87W/LxOw6foS0W4k37dvAHv3oJozMpfnAoP7iYzlr84NBpfvU5m13fu7+/n6cfeYCAgvff8kdl10D/zw9/2PkAFrt37+bWW28t+VqlFL/2nC08cmrxAcM7Hx+lvSnMKy7dVNZ7rzXP3d7O1vaYO1hWDqmsVdBOQBDqRcMehf/vqXFecEFXTWZ/XbG9nadGZxcdOT98aprWaIgLexavZX3JxT08emra7fjn5dRUkrPTKfaVOQDzkot7eeTUFOfnForkfz05zkv39BIKBnjZnh52djXxrQdOuI//55NjTCYy3Ly38AQYCgbobY1WNNX9Jde9AoCf/PgOjh8/vuQJ4YUXdqM13HusdJnfwWcm2buzo+IByrVCKcXl29t58kz5Nc/p3OJWjCCsJQ15FE4lMhwbn+MFF9am5Ojy7e3kLM3h09MlHz98eprLtrYtOTB46dZWcpYuWV980Jlkc9Wu8oT9BRd2oTU8dKJwcs5kIs2Z6SR7d9q2hFKKm67czs+eHndrvf/14Em6miNuqaCXLW0xzkwvLezeSoUJp9FXKLD8YXPFjnaUouR3mEhnefLsjBu3X9mzqYVj43NlT7xKZXIi7IIvaMij8Mi4LZYX99Zm9telzqBpqQk3WmueODPDpVuXnqBg9vH46YUZ3zEn/j2by4vf7OvJs4X7Mn1TTB8VgJuv2oal7VrvmWSGHx8+y2uv2Fqy/HJzW2zZjN00/gI451wxlJNkx8JBtnc0leztcnRsDkvDZct8h/Vmz+YWspbmeJn9aVJZq+xSWUGoJY0p7E4WXKtpvWaAq1SzoJlUltlUdkG/8GL6upsJBxVPl+htMnIuQW9rlHikvLHrjniEzW1RHi+yBUynwwt68rFcvKmVbbEsf3b7D9n2gl+3xWa09DrjW9uXz9jTnmx1LmXbSsEyMnY7ruaSwu52cPT54JWx2sppPDY0NMTBhx/h37//PVnVSKg7jSnsY3OEg2pBa9dqEQsH6WmJlBR2k+EuV90QCgbY3BbjdIl9DE8k2F2hqD1rc2uJjD1BQBUK5NDQEE/eMQTdu+l61XtIjx7jEx98Z0mh2dweYya59OIV3j7isyl79Zni7o6LcUFPM8fG5hZMPBlpEGE3zaTGZ5eeMWpWNcpohc6mZFUjoe40qLDP0tfdXNOBt20dTZwqYVPkez4vX7a2rb3Jbf7vZeRcgl0VitpFvS0LRHJ4Yo5tHU0FA8iDg4Ocf/BH5OYmUaEIUz//+qItA8xnWCpr92bss27GXp6w7+qKM5PKLliO7JnzCdqbwrT5vB7Y9NterjfN4OAgiUQCFYqgs/ZJQFY1EupJQ9axHxmbZU+Nu6tta28qaaOUm7EDbO2Icf9wYcOhnKU5O51c0G98OS7oaWYunWNsNsWmVvu9j4/PuQtQGEZGRtBac+pL7yMY7yQzdszdXozJSM9OJblokfEKb8ZuMvtyhd3sf3QmRUc8P7nqxPl5dtToaquahIMBupoXX07NTDs3360KRbAy+exeVjUS6kXDZeyZnMXIRGJRIaoWW9pjnC2RsZ92ej4bcV2KbR1NnJlKFgxAnk+ksTRlz5IzXGAWmnAGdLXWHBufo6+nMPM3M0GtuUlX1L3bvZie0kstF7YaYd/kLB12tuiKYGwm5ft+1obeluiiwv6+rz1I/xd+6X63KhRFZ/PPlVWNhHrRcMI+ci5B1tI1F/au5ggzqeyCUreJuRQdTeGyytq2tcfIWrpgMeMJx6+tdFmtC4pWEDqfyDCdzC7I2Pfv3088Xij2xTNEDVuKrJhv3DfCJ3/4eIHdk8nlb8+uMGM/W7Rs3sRsumR7BD/S21p6PdepRIZ/e/g0vzx6jg/+8Z8Tj8dR4bwVs9h3LghrQcMJe60rYgydzozQ80UzJ6fns2X3DjFZeaGwO+slNleWsW/rsBdsOOZUwuQrYgqFvb+/n1tvvZXdu3ejlFpyhmg8EqItFuLsdJJ01uKWbx3i8/95hCfP5i2ogow9bQt7qNyMvc3+jF6PWmvNxFyK7gqvWOpFb2vpjP3xM/n6/GddfR2f/fytKBWAbHrJ71wQ1oKG89iPOFbEhb3NyzxzdZgueufnMgW2y9R8hrYyhd2I1znPjNFx53ZPhRl7MKDY3R13rZjjJWrYDf39/WWLypb2GKenkjzmmUj06KkptwFaOpefOWvKHZeamOUlHgnRGgsVLHQ9ncySyemKP3+9aG8KLxj8hcIB5+MTc/S/4U38yaP/wV/95ad450suWPB8QVhLGi5jPzo2S29rtOYVFV3OYN+5omn808lM2Rm7sVsmPOVyJmOv1GMHW8SNFXN8fM4udVymnn45trQ3cXY6WTCr1bTRBUhnF1ox5WbsYGe83ozdvWJpEGFvawozm8piWYUlm94sfnQ6RdJpHRELN9xPSliHNNxReGRslgtLZKnVxs3Yi6yYqflM2SeVbmcfE3NeYU8TDKiyTw5eLuxpZvhcgpylOTaRYEdnfNVT2Le0RTk9leTgyCS9rVG6miOc9Qhx4QQlW9gDZdaxA/Q0RwtPbM53UakVVS/aYiG0tiemeRmfTRMOKi7sbWZ0Jun2FYr5eOV6YePQUMJuWc50/jVYb7BrEWGfns+WbcW0xcKEAsrNUsEefO1qjpRtZ3jp62kmnbU4NTnP8fE5dnevfoLP2MjTjE3P8407DzD22H1EcvOMemwGr8duVnmqJGPvbokUndgaL2MHmC6yYyZmU3Q3R9nSFuPsdIqks5C1tBQQ/EDDCPvQ0BAXXnENc+kct39mf81n9XU4g6fnPNmm1prp+QxtTeUNTQQCiq7mSKHHPpt2M/lK8VbGHJ+YWzBwWilDQ0N872v/BCpAuHMr5488yMhTh3ly+JT7nFINsCo5KXW3LPz8sDIrqh6YK6tin318NkVPa4TNbTHOTnsydrFiBB+w7FGolNqplLpLKXVYKfWoUuoDzvYupdQdSqmnnP9rtgigmbI9lrXF4PTh+2o+ZTsaChKPBAt+0KmsRTpnVWSjdDVHCqakT8ymVixqxoI6cPwcMyVKHStlcHCQmeFH3fupZx4lm5hh5FS+97s3YzdUlLE3RzmfSJN1ThDGlumMN0jG7thu08mijH0uTXdzlE1tUUanU8y7wi4Zu1B/ykkvssCHtNbPBl4IvFcp9Wzgw8BPtNZ7gJ8492uCmbLd/OzrsFJzpMeH12TKdnFFhLldycBtT0uUiTmvFZNesQ3R2xqlORLk35yFNC7b2rai/RhGRkbIjB0nOfwwyWceIXXyMax0gqzKfz4j7G2x/FVKJes99rRE0Nquuwfbimovcx6AHzBXZ9PzhR77xKz9d9zcGiOds9xJWJKxC35g2aNQa31aa/2Ac3sGeAzYDtwE3O487Xbg5loFOTIyQrC1m+j2y5i+9zuQy7rba0lbrFDYjc9aSca+wIqYSa144FApRVsg5ZZ8vuWGl6zqqsWeGak5+/VBzv7zhwGNlUoQasrPETCDp96VnioRdlPyaU5uqzmx1YPFMvbppD2I3uvMrj1x3q4kqsXCL4JQKRWlF0qpPuAq4B5gs9barMF2Bthc1cg87Nq1i9zMBKe/9H6mfv71gu21pL0pXPCDdjP2Cq0YYz/Mp3PMpXMrFrahoSGOHboXgOzUWYaffnxVllR+lmq+lC9oZVCRJnf2aT5j9wh7BVUxZhDafAf2oGMDCXuJwVOtNXOpLC3RkCvsphWxWDGCHyhb2JVSLcC3gA9qrQuWxdG2CuhFXjeglDqglDowNja2oiCNAOXm8g211mLKdltTmCnPJbgR+Uoy9p6WKLOpLMlMzs1aVzo5Z3BwkNkn7wEg8eQv7P9XYUmVmqX6htfdiEa5FTAmYzfCrlRlg6fms5rZtxOz6YYpdQRojRorJi/syYyFpaHZI+xmJnA8IsIu1J+yhF0pFcYW9SGt9bedzWeVUludx7cCo6Veq7W+VWu9T2u9r7e3d0VBVjJNvpq0NYUKftB5j738CbteYZtYZUXIyMgIc4/8hNNf+RDn7/piwfaV0t/fz/Hjx7Esi+PHj/Oya68BYCZpn9AyJmN3vOZKsnXI16u7GXuDWTGBgKI5EnRbFgPMOH3pW2J5YTerbVVyNScItaKcqhgF3AY8prX+a89D3wPe7tx+O/Dd6oeXp1iA1qIPR3tTuEDYzQBapRk72GV+JmNfaZ8UYz2lTz0B2lqwvRq0OCcts6hGOmcRUHZ2ChAKVibs7U1hggHFuTm7MuZ8It0wfWIMzdFQwWIkprVCSzRIazRENBRgdCZlf0+SsQs+oJyM/cXA24DrlVIHnX83Ap8EXqWUegp4pXN/XdHeFGYmlXXb7q7EY3eFfSbllj2u1GOupHPjSnGtBydjT2ctwsEATY53HKlwcRNTyz8xl+J8IoPWK7ei6kVLLMRs2ivs9u3mSAillJu1t0Tt+4JQb5b1FLTWdwOLHa2vqG44/sKtiJjP0NkcYXo+QzwSLLkw9GL0tOY7PJqSv5VaEeYqZXBwkJGREXbt2sX+/furevXSajJ2I+w5i0go4A4KRlZQ9dHt1PK7VywN5LGDLdjm+4B8z5wW5yTY2xrlxPl5sWEE3yBFt0tgLBczaFpJnxiDyc7HZ1OMziRpiYbKXsS6FLW2pIwVM+PJ2K1sms/93WcAOHPyGXp6eiqqxOlpiTIxm2J8ZmW96OtNS5EVY0TefFe9zlVZo0y6EtY/IuxLUDydvJLOjoZYOEhrLMT4bJrRmZS7qpBfMZaLmUn52JNPMX3+HHPTpiJJMzExwTve8Y6yxd22YtLuJJ7NDbJ6kqE5GnKzdMj3pW/2ZOxQ3jq4grAWiLAvQb6G2f4hT1XQJ8ZLb4u9Cs/odNIVAb9iLBfT++SBBx/CyqbdlYFQ9iGTTqfLLrPsbrFr+U3XSL+f3IppLRL2YivmWZvtpnSVWHSCUEvkSFyCBRl7BasneelpiTI+k7Izdp9nq6btrBH2RDKNzmXQGactgsofMuWWWZpa/pGJBK3RkJvpNgrFGbtrxTif4xWXbeLqvk4+8Mo9dYlPEIoRYV8Ck50bYV+Jxw52xjo+m2J02v9WTCxiHxIpp3493toGuaybsSuPsJdbZmnGGQ6fnmZzA9oVC8sdsyiVn4y0ozPOv7znRW7mLgj1RoR9CYoHT6eT5S+L52VzW4wjY3PMZ3K+F/ZIMIBS+Yx9zyWXgZVFZ03GbhdIRSKRsssszaLZD5+YYnObvz9/KVpjITI5TcpZJWk2lXNLHQXBj4iwL0FTOEg4qJiaz5CzNDPJlVkxY8cOu7f/fPBDNe8lvxqUUsRCQVfYuzdt5lkXX0RLkyPIgQDd3d188YtfLLsi56Gf/cS9/bP/+L6vP38pzKQjY8HMpbI0R2UikuBfRNiXQCnldng0P+pKM/ahoSH++R/+xr1/6omDNe8lv1pi4YC7IlAmp9mxdQvf+eY3AOjs7GJ8fLxsUR8aGuKjH3i3e//88GHff/5iWhz7zcw4nXUagAmCXxFhXwbTVmBqBS17wZ5MNHX0Ifd+ZuKZNeklvxpi4XzGns4WTlCqtPJjcHCQxOwM2ZkJAJInHvP95y+mxcnOTY8YEXbB78jRuQxtzmIbxmevpAEY2JUjWts9z7VluT1eat1LfjXEwkGSzuBpOmsRCQbcVgId8cpObOZznv3aR4jtei7pU48XbG8EWqKFGbttxchPR/AvkrEvQ9sqM3ZTOZIcfojUM4cWbPcjsXCQeadtbyZnEQ4FzJhpxas2mc+ZPX+K2Yd+tGB7I2D89FnJ2IUGQYR9GezFNrJul8dKPfa1aNxVbWLhAKlsjqGhIY4Oj/C1r36F17zoSt64Y45PveGKivbViJ+/GLd/jnjsQoMgwr4M7U0hpuYzbgOvSq2IevWSXw2xUJCRk2cYGBggqxU6l2Z4eJjP3fJ7/KsziFoujfj5izG2i6llFytG8DtydC5DW8y2Ys45nQm7VtByt7+/v6GELBYOMHziJIlEgq5gCO2sMWsGPSv9LI32+Ysx2Xm+3DHnNgATBD8iGfsytDeFyVqaE+fnaYmGNsRixbFwkLRlm+oqGEbn8ouNNNKgZ7VojhgrJksqmyOds8SKEXyNCPsymMHSo+NzdDZvjH7bsXCQSMz2xVUwDNm8sDfSoGe1CAQU8UiQ2VTWrYyRlZIEPyPCvgxmsPT4+BxdDbZAxEqJhQO0dHQRjzejgiG0ZVsQjTboWU1MT3Z39STJ2AUfI8K+DCZjH51JrXhJu0YjFg6iQhE++/l/sDfksg056FlNWqIhZlJZdwGSVvHYBR8jwr4M3rr1jbJCjpl5+vrffBMA//svPrlmC4j7lZaYk7GnJWMX/I8I+zLcfce/ube/NfSlhupxslJioSCZnHbbCkRCcpg0R+x1T2fFihEaAPnFLsHQ0BAfeu9/c+9PDD/ecA2sVkIsbB8WxnaIyMpAtMRCzuCpY8WIsAs+Rn6xSzA4OEgikXDvZ8ZHGq6B1UowDb/MbFvJ2J3B03TWrWWXjF3wM3J0LoGp2R777idpuugaUicbr4HVSjAZ+7QjYrKWpy3ss8ms2zNoJQuuCMJaIcK+BLt27WJ4eJjE43eTePzugu3rGZOxzyQlYzfYy+PlmJrPEAwoqWMXfI38YpdgPTSwWgl5K8bx2EXYaYkGSecsxmdTtDeFZVk8wdfIL3YJ1kMDq5XgCrvJ2MWKcVsInJycX9HyiIKwlogVswyN3sBqJTTJ4OkCzGDpqcmk+OuC75FfrLAAI+xmoFAy9vxMU8nYhUZAfrHCApoi9mFhhF2qYvIZezpribALvkd+scICYsUZu1gxBW1625vEwRT8zbK/WKXUF5VSo0qpRzzb/kQpdVIpddD5d2NtwxTWkmKPPSrCTk9LvrOnZOyC3ynnF/tPwA0ltn9aa73X+feD6oYl1JOmiKmKkQlKhjv/7Tvu7c/+5f5131ZCaGyW/cVqrf8LOLcGsQg+IRYSK8bL0NAQv/+eAff+2WOPbYieQULjsppf7PuUUg87Vk1n1SIS6k4goIiGAp7B0409Gcf0DMrOTACQGT22IXoGCY3LSoX9c8BFwF7gNPBXiz1RKTWglDqglDowNja2wrcT1pqmSJCcpYH8YOpGxfQGOvuNQUa/+afkZs8VbBcEv7EiYddan9Va57TWFvCPwDVLPPdWrfU+rfW+3t7elcYprDFmADUUUBveYze9gbITJ5g/cu+C7YLgN1b0i1VKbfXc/Q3gkcWeKzQmRtg3erYOG7dnkNC4LFuQq5T6GnAd0KOUOgF8DLhOKbUX0MBx4N01jFGoAzFX2Dd2tg64LSUGBwcZGRlh165d7N+/f8O1mhAah2WFXWv9lhKbb6tBLIKPMCWP0ZBk7LAxewYJjYukY0JJjBXTJH3HBaHhEGEXSiJWjCA0LvKrFUoSdzL1mFgxgtBwiLALJTH9UEy7WkEQGgcRdqEknXFb2KXhlSA0HiLsQkna4xEAggE5RASh0ZBfrVCSnhZb2Fui4rELQqMhBqpQkhsu38K7f+VCfvP5O+odiiAIFSLCLpQkGgrykVdfVu8wBEFYAWLFCIIgrDNE2AVBENYZIuyCIAjrDBF2QRCEdYYIuyAIwjpDhF0QBGGdIcIuCIKwzhBhFwRBWGcorfXavZlSY8BwDd+iBxiv4f6rjcRbOxopVmiseBspVmiseBeLdbfWurfcnaypsNcapdQBrfW+esdRLhJv7WikWKGx4m2kWKGx4q1WrGLFCIIgrDNE2AVBENYZ603Yb613ABUi8daORooVGiveRooVGiveqsS6rjx2QRAEYf1l7IIgCBuehhF2pdRxpdQhpdRBpdSBEo9fp5Sach4/qJT6Y89jNyilnlBKPa2U+rBP4v1DT6yPKKVySqmucl5bg1g7lFLfVEo9rpR6TCl1bdHjSin1t87397BS6nmex96ulHrK+ff2WsdaZrz9TpyHlFI/V0pd6XnMb9+t347b5eL1xXGrlLrEE8dBpdS0UuqDRc/xzXFbZrzVO2611g3xDzgO9Czx+HXA90tsDwJHgAuBCPAQ8Ox6x1v03NcCd67ktVWK9XbgXc7tCNBR9PiNwA8BBbwQuMfZ3gUcdf7vdG53+iDeF5k4gFebeH363frtuF0y3qLn1vW4LfquzmDXevv2uC0j3qodtw2Tsa+Ca4CntdZHtdZp4OvATXWOqZi3AF+rxxsrpdqBlwG3AWit01rryaKn3QR8Wdv8EuhQSm0Ffg24Q2t9Tmt9HrgDuKHe8Wqtf+7EA/BLoC7r+5X53S7Gmh+3K4i3bsdtEa8Ajmitiyc/+ua4LSfeah63jSTsGvgPpdT9SqmBRZ5zrVLqIaXUD5VSz3G2bQee8TznhLOt1pQTL0qpOPZB9a1KX1slLgDGgC8ppR5USn1BKdVc9JzFvsN6fLflxOvlndhZm8Fv3y3457gt+7v1wXHr5c2UPsH46bj1sli8XlZ13DaSsL9Ea/087EuU9yqlXlb0+APYlzZXAn8H/OtaB1jEcvEaXgv8TGt9bgWvrQYh4HnA57TWVwFzwJr4uSuk7HiVUi/H/oHc4tnst+/WT8dtJcdCvY9bAJRSEeB1wL/U+r2qQTnxVuO4bRhh11qfdP4fBb6DfanqfXxaaz3r3P4BEFZK9QAngZ2ep+5wttU1Xg8Lzt4VvLYanABOaK3vce5/E/vH7WWx77Ae32058aKUugL4AnCT1nrCbPfbd+uz47as79ah3set4dXAA1rrsyUe89Nxa1gq3qodtw0h7EqpZqVUq7kN/CrwSNFztiillHP7GuzPNgHcB+xRSl3gnC3fDHyv3vE6j7UDvwJ8t9LXVgut9RngGaXUJc6mVwCHi572PeB3nCqDFwJTWuvTwI+AX1VKdSqlOp1Yf1SrWMuNVym1C/g28Dat9ZOe7b77bv103JZ5LPjiuPWwlM/vm+PWw6LxVvW4XauR4NX8w64MeMj59ygw6Gx/D/Ae5/b7nMcewh54eJHn9TcCT2JXGQz6IV7n/u8CXy/ntTWOdy9wAHgY2wroLPpuFfBZ5/s7BOzzvPYdwNPOv99bo+NhuXi/AJwHDjr/Dvj4u/XNcVtOvD47bpuxT4Ltnm1+Pm6Xi7dqx63MPBUEQVhnNIQVIwiCIJSPCLsgCMI6Q4RdEARhnSHCLgiCsM4QYRcEQVhniLALgiCsM0TYBUEQ1hki7IIgCOuM/x9Tw4zy+gK15wAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x1135044e0>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import numpy as np\n",
    "from sklearn.gaussian_process import GaussianProcessRegressor\n",
    "from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C\n",
    "from sklearn import datasets\n",
    "import matplotlib.pyplot as plt\n",
    "from matplotlib import gridspec\n",
    "%matplotlib inline\n",
    "\n",
    "boston_houses = datasets.load_boston() # Load the boston housing price dataset\n",
    "avg_rooms = boston_houses.data[:20, np.newaxis, 5] # load the average number of rooms for the first 20 records\n",
    "price = boston_houses.target[:20] # load the price for the first 20 records\n",
    "\n",
    "# initialize a model\n",
    "model = GaussianProcessRegressor(\n",
    "    kernel=C(1.0, (1e-3, 1e3)) * RBF(10, (1e-2, 1e2)),\n",
    "    normalize_y=True)\n",
    "\n",
    "# fit the model\n",
    "model.fit(avg_rooms, price)\n",
    "X = np.linspace(min(avg_rooms), max(avg_rooms), 1000).reshape(1000,1)\n",
    "preds = model.predict(X)\n",
    "\n",
    "plt.scatter(avg_rooms, price,  color='black')\n",
    "plt.plot(X, preds)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This model goes through every point but wouldn't generalize well to other points.\n",
    "\n",
    "A model like that does well on the training data but doesn't generalize is doing something known as **overfitting**.\n",
    "\n",
    "Overfitting is an incredibly common problem in machine learning.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Test/Train Splits \n",
    "\n",
    "We held out around 30% of our tweets to test on.  But we only have around 9000 tweets.  Two questions to ask yourself:\n",
    "\n",
    "1. Why might we get unreliable accuracy measurements if we held out 90% of our data as test data?\n",
    "2. Why might we get unreliable accuracy measurements if we held out only 1% of our data as test data?\n",
    "\n",
    "Pause for a second and think about this before reading on.  Test your understanding of the code by trying these experiments.\n",
    "\n",
    "If our held out testing set is too big our model doesn't have enough data to train on, so it will probably perform worse.  \n",
    "\n",
    "If our held out testing set is too small the measurement will be too noisy - by chance we might get a lot right or a lot wrong.  A 70/30 test train split for smaller data sets is common.  As data sets get bigger it's ok to hold out less data as a percentage."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cross Validation\n",
    "\n",
    "The best way to efficiently use all of our data to measure a model's accuracy is a technique called cross validation.  The way it works is we randomly shuffle our data and then divide it into say 5 equally sized chunks which we call folds.  We hold the first fold out and train on the other folds.  We measure the accuracy of our model on the first fold.  Then we hold out the second fold and train an *entirely new model* and measure its accuracy on the second fold.  We hold out each fold and check its accuracy and then we average all of the accuracies together.  \n",
    "\n",
    "I find it easier to understand with a diagram.\n",
    "\n",
    "<img src=\"images/cross-validation.png\" width=\"600\"/>\n",
    "\n",
    "It's easy to do this in code with the scikit-learn library.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Accuracy [ 0.65824176  0.63076923  0.60659341  0.60879121  0.64395604  0.68901099\n",
      "  0.70077008  0.66886689  0.65270121  0.62183021]\n",
      "Average Accuracy 0.648153102333\n"
     ]
    }
   ],
   "source": [
    "from sklearn.model_selection import cross_val_score\n",
    "\n",
    "# we pass in model, feature-vector, sentiment-labels and set the number of folds to 10\n",
    "scores = cross_val_score(nb, counts, fixed_target, cv=10)\n",
    "print(\"Accuracy\", scores)\n",
    "print(\"Average Accuracy\", scores.mean())\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It looks like our accuracy is closer to 65%.  Do you think this is good or bad?  \n",
    "\n",
    "## Baselines\n",
    "\n",
    "On some tasks, like predicting if the stock market will go up tomorrow or whether a roulette wheel will come up black on the next spin, a 65% accurate model might be incredibly effective and make us rich.  On other tasks like predicting if there will be an earthquake tomorrow, a 65% accurate model could be embarrasingly bad.\n",
    "\n",
    "A very important thing to consider when we evaluate the performance of our model is how well a very simple model would do.  The simplest model would just guess randomly \"No Emotion\", \"Positive\", \"Negative\" or \"Can't Tell\" and have 25% accuracy.  A slightly better model would always guess the most common sentiment.\n",
    "\n",
    "We can use scikit-learns dummy classifiers as baselines to compare against.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 0.59230769  0.59230769  0.59230769  0.59230769  0.59230769  0.59230769\n",
      "  0.5929593   0.5929593   0.59316428  0.59316428]\n",
      "0.592609330138\n"
     ]
    }
   ],
   "source": [
    "# Train with this data with a dummy classifier:\n",
    "from sklearn.dummy import DummyClassifier\n",
    "nb = DummyClassifier(strategy='most_frequent')\n",
    "\n",
    "from sklearn.model_selection import cross_val_score\n",
    "\n",
    "scores = cross_val_score(nb, counts, fixed_target, cv=10)\n",
    "print(scores)\n",
    "print(scores.mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dummy classifier has 59% accuracy, so our model is around six absolute percentage points better than picking the most common category!  It doesn't seem like a lot but I can't tell you the number of times I've seen models that did worse than a dummy classifier."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Takeaways:\n",
    "1. When building a machine learning model, get to the evaluation step as quickly as possible.\n",
    "2. Always compare your model's performance against a baseline.\n",
    "3. If you don't have an infinite amount of data, use Cross-Validation to evaluate performance.\n",
    "\n",
    "## Questions\n",
    "\n",
    "1. Imagine we were trying to predict whether or not an earthquake was going to happen tomorrow - what would the baseline accuracy be?\n",
    "2. In what scenario would you want to do more folds of cross-validation?  When would you want to do less?\n",
    "3. Why was the accuray of the dummy classifier consistently 59% while the model we build had accuracy from 60-70%?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
